Using pdf2Data REST API
This example uses configuration values for the RESTful service that can be set during container deployment.
We assume that the pdf2Data REST API Engine is available at localhost:8080 and that the authorization token is "AUTH_TOKEN". Replace these values with the ones you have configured.
As with the other engine types, using the REST API service involves 4 operations:
Upload license
This must be done once per service per license.
curl -X 'POST' \
'https://localhost:8080/api/v2/license' \
-H 'accept: application/json' \
-H 'Authorization: Bearer AUTH_TOKEN' \
-H 'Content-Type: multipart/form-data' \
-F 'licenseFile=@pdf2data_license.json;type=application/json'
Register template
Register the template in the Engine (once per template per instance).
curl -X 'POST' \
'https://localhost:8080/api/v2/templates' \
-H 'accept: application/json' \
-H 'Authorization: Bearer AUTH_TOKEN' \
-H 'Content-Type: multipart/form-data' \
-F 'templateArchive=@template_for_sdk.p2d'
Response:
{
"id": "templateID",
"name": "template_for_sdk",
"description": "test"
}
In the next step, we will use this id ("templateID") to specify which template must be used for parsing.
You should use "processed" templates, i.e. ones that have a *.p2d extension.
You can get a "processed" template by clicking the "Download for SDK" button in the pdf2Data UI with the Manager component.
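The id returned by the registration call can be captured in a shell variable so that later calls can reuse it. The following is only a minimal sketch: json_field is a hypothetical helper (not part of pdf2Data), and the sed pattern assumes a flat response with string values like the one shown above. A real JSON parser is a better choice for anything nested.

```shell
# Hypothetical helper (not part of pdf2Data): print the value of a
# top-level string field from a small, flat JSON document.
json_field() {
  # $1 = field name, $2 = JSON text
  printf '%s' "$2" | tr -d '\n' |
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sample response copied from the registration step above.
response='{
"id": "templateID",
"name": "template_for_sdk",
"description": "test"
}'

# Keep the id for the job-scheduling call.
TEMPLATE_ID=$(json_field id "$response")
echo "$TEMPLATE_ID"
```

In practice, response would hold the output of the curl registration call instead of the sample text.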
Recognize
The REST Engine is asynchronous.
Whenever you want to process a PDF with the pdf2Data engine, you should:
Schedule a recognition job
PDF:
curl -X 'POST' \
'https://localhost:8080/api/v2/jobs' \
-H 'accept: application/json' \
-H 'Authorization: Bearer AUTH_TOKEN' \
-H 'Content-Type: multipart/form-data' \
-F 'pdf=@FileToParse.pdf;type=application/pdf' \
-F 'jobRequest={
"jobType": "RECOGNIZE",
"templateId": "templateID",
"preprocessingType": "NONE"
}'
Image or scanned PDF:
curl -X 'POST' \
'https://localhost:8080/api/v2/jobs' \
-H 'accept: application/json' \
-H 'Authorization: Bearer AUTH_TOKEN' \
-H 'Content-Type: multipart/form-data' \
-F 'image=@FileToParse.png;type=image/png' \
-F 'jobRequest={
"jobType": "RECOGNIZE",
"templateId": "templateID",
"preprocessingType": "OCR"
}'
Where:
jobRequest.jobType - the type of job to perform:
- use RECOGNIZE for actual recognition; note that this consumes license volume;
- use CHECK for a preliminary run to verify that the result counts are as expected; this call does not affect license volume.
jobRequest.templateId - the id of a template registered in the engine (see "Register template" above).
jobRequest.preprocessingType - the preprocessing type for the document. Can be NONE or OCR.
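Because the jobRequest part is plain JSON with a small set of allowed values, it can be assembled and checked before the request is sent. The sketch below assumes nothing beyond the fields and values listed above; build_job_request is a hypothetical helper name, and it does no JSON escaping, so it is only suitable for simple ids.

```shell
# Hypothetical helper (not part of pdf2Data): build the jobRequest JSON
# and reject values that the documentation above does not list.
build_job_request() {
  # $1 = jobType, $2 = templateId, $3 = preprocessingType
  case "$1" in
    RECOGNIZE|CHECK) ;;
    *) echo "unsupported jobType: $1" >&2; return 1 ;;
  esac
  case "$3" in
    NONE|OCR) ;;
    *) echo "unsupported preprocessingType: $3" >&2; return 1 ;;
  esac
  printf '{"jobType": "%s", "templateId": "%s", "preprocessingType": "%s"}' \
    "$1" "$2" "$3"
}

# A CHECK job is a dry run that does not consume license volume.
JOB_REQUEST=$(build_job_request CHECK templateID NONE)
echo "$JOB_REQUEST"
```

The resulting string can replace the inline JSON in the curl commands above, for example -F "jobRequest=$JOB_REQUEST".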
You receive the job ID in the response:
Response (PDF)
{
"jobId": "jobID",
"jobType": "RECOGNIZE",
"jobStatus": "QUEUED",
"templateId": "templateID",
"pdfName": "FileToParse.pdf",
"imageName": null,
"errors": [
"string"
],
"preprocessingType": "NONE"
}
Response (image or scanned PDF)
{
"jobId": "jobID",
"jobType": "RECOGNIZE",
"jobStatus": "QUEUED",
"templateId": "templateID",
"pdfName": null,
"imageName": "FileToParse.png",
"errors": [
"string"
],
"preprocessingType": "OCR"
}
Get recognition result
curl -X 'GET' \
'https://localhost:8080/api/v2/jobs/jobID/result?outputFormat=JSON' \
-H 'accept: */*' \
-H 'Authorization: Bearer AUTH_TOKEN'
Where:
outputFormat - the file format in which the extracted data must be presented: JSON, XML, JSON_WITH_META, or XML_WITH_META.
This call returns the extracted values in the specified format.
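The result URL is composed of the job ID and the output format, so a small wrapper can catch a mistyped format before the request is made. This is a sketch only: result_url and BASE_URL are hypothetical names, and the base URL is the placeholder used throughout this example.

```shell
# Hypothetical helper (not part of pdf2Data): compose the result URL for
# a job, accepting only the output formats listed above.
result_url() {
  # $1 = base URL, $2 = jobId, $3 = outputFormat
  case "$3" in
    JSON|XML|JSON_WITH_META|XML_WITH_META) ;;
    *) echo "unsupported outputFormat: $3" >&2; return 1 ;;
  esac
  printf '%s/api/v2/jobs/%s/result?outputFormat=%s' "$1" "$2" "$3"
}

BASE_URL='https://localhost:8080'   # placeholder value from this example
RESULT_URL=$(result_url "$BASE_URL" jobID XML_WITH_META)
echo "$RESULT_URL"
```

The URL can then be passed to the same GET call, for example curl -X GET "$RESULT_URL" -H 'accept: */*' -H 'Authorization: Bearer AUTH_TOKEN'.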